A Deductive Question Answering System on Relational Data Bases

Author

  • Koichi Furukawa
Abstract

This paper describes a new formalization of a deductive question answering system on a relational data base using a theorem proving technique. A theorem proving procedure for a finite domain is investigated and a direct proof procedure based on substitutions of equivalent formulas which employs the breadth first search is introduced. The search strategy is then expanded to set operations of the relational algebra which are incorporated into the proof procedure in order to increase the data base search efficiency. Virtual relations are realized by means of introducing several axioms and utilizing the deductive capability of the logical system. Furthermore, a conditional domain is introduced as one of the virtual domains and is used to give a relational view to a pseudo relational data base which can represent exceptional cases using some link information. A query transformation system called DBAP (Data Base Access Planner) which embodies those features is implemented in QLISP.

1. Introduction

Many research groups in the artificial intelligence field have been concentrating their efforts on how to represent knowledge and how to perform logical inference and/or common sense reasoning. The knowledge data bases are organized in very complicated ways in order to realize those very high level functions.
These structural and operational complexities have been preventing us from expanding them to very large knowledge data bases. On the other hand, there have been many projects to develop very large commercial data bases in the data base research area. This kind of data base is assumed to be used in a relatively simple manner and consequently has simple structures. Efficient search algorithms for such simple structures have been developed extensively and some special purpose hardware systems with parallel searching capability are being developed in many places. Our current research goal is to combine these two separate efforts to build up a very large data base with the deductive capability [8], [11].

Codd, E. F. [2] introduced an algorithm to convert any query written in a relational sublanguage to a sequence of relational algebraic operations in order to show the relational completeness of the relational algebra. His algorithm can be considered as a formal question answering (QA) procedure on a relational data base. On the other hand, Green, C. and Raphael, B. [4] formalized a deductive QA system based on first order logic. The essential point of their formalism is that knowledge is represented by a set of axioms and the answer of the question is extracted from the refutation proof of that question.
In this paper, these two formalisms are combined by introducing a proof procedure for a finite set, where logical expressions are interpreted as set operations on the set. A proof procedure for queries which require all answers satisfying the given specification is presented. It is a direct proof procedure based on substitutions of equivalent formulas. As an intermediate result of the direct proof, the system generates an access plan to the data base, and then the plan is executed to get all the answers satisfying the specification. The set operations of the relational algebra are considered as expanded notions of the breadth first search strategy and are incorporated into the proof procedure to express the access plan.

Stonebraker, M. [10] introduced the notion of views (we call them virtual relations) in order to provide users with the deductive capability, and realized them by means of query modification. In this paper, virtual relations are considered to provide a semantic model of the base relations and are defined by a set of non-ground axioms. The query modification process can be considered as a process of substituting a formula by an equivalent formula, the rule for which is given by the associated axiom.
An axiom called a conditional domain axiom is particularly interesting. It is used to give a relational view to a pseudo relational data base which can represent exceptional cases using some link information. In addition, some considerations on deletion of redundancies will be presented. Optimization of the access plan will also be considered. The implemented query transformation system DBAP will be briefly explained. In the last section, the conclusion and some future research work to be done will be described.

2. Formalization

Generally, a formal QA system consists of a set of axioms and a theorem prover to get answers for a given query. Fig. 1 shows the configuration of our system in terms of the formalism. In a formal system, each datum in the data base has to be expressed by a ground clause (a clause which does not contain any variables). There are two typical representations: namely, the tuplewise representation and the domainwise

Natural Language-3: Furukawa

[...] putting a prefix symbol and read the expression as 'Find all ?x such that (i) ...'. The intentional file consists of non-ground axioms which define users' views or virtual relations. The objective of introducing users' views is to keep the query language independent of the logical structure of the relational data base.
Assume that we have a relational data base which consists of the following base relations:

EMP(NAME, DNAME, SAL)
DEPT(NAME, MGR)

where the domains DNAME in EMP and NAME in DEPT are both the set of departments. Assume also that a user wants to define a virtual relation VEMP(NAME, DNAME, SAL, MGR). In the virtual relation VEMP, the domain MGR belongs to the employee relation, but in fact it belongs to the department relation. The domain MGR is considered to have been transferred from the department relation to the employee relation, and we call this kind of virtual domain a transitive domain. In terms of the VEMP relation, the fact that the manager of an employee i is x is expressed as VEMP.MGR(i,x). The QA system has to transform this expression to the following conjunction of literals on the base relations:

(i)(x)(VEMP.MGR(i,x) ≡ (∃j)(∃y)(EMP.DNAME(i,y) ∧ DEPT.NAME(j,y) ∧ DEPT.MGR(j,x))). (1)

Let us consider the query "Who is the manager of Mr. SMITH?". In terms of the virtual relation VEMP, this question is logically expressed by

(∃?x)(∃i)(VEMP.NAME(i,'SMITH') ∧ VEMP.MGR(i,?x)). (2)

By substituting the second term in (2) by the righthand expression of the equivalence sign ≡ in (1), we obtain the following expression:

(∃?x)(∃i)(∃j)(∃y)(VEMP.NAME(i,'SMITH') ∧ EMP.DNAME(i,y) ∧ DEPT.NAME(j,y) ∧ DEPT.MGR(j,?x)).

So far, we have obtained the expression in terms of the base relations except the underlined literal VEMP.NAME(i,'SMITH'). This literal is transformed to the corresponding base relation literal by the following axiom:

(i)(x)(VEMP.NAME(i,x) ≡ EMP.NAME(i,x)).

This type of axiom is called a simple domain axiom, and a query which does not include any virtual relation literals is called a base query.
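The substitution steps above can be sketched in Python. This is only an illustrative sketch, not the DBAP implementation: the sample tuples, the dictionary-based domainwise storage, and the helper names `rewrite`, `unify`, and `solve` are all assumptions introduced here. The right-hand side used for VEMP.MGR is the conjunction that appears in the substituted query.

```python
from itertools import count

# Hypothetical sample data, stored domainwise: each domain maps a tuple
# identifier to a value, mirroring literals such as EMP.NAME(i, x).
BASE = {
    "EMP.NAME":  {1: "SMITH", 2: "JONES"},
    "EMP.DNAME": {1: "SALES", 2: "TOYS"},
    "DEPT.NAME": {10: "SALES", 11: "TOYS"},
    "DEPT.MGR":  {10: "BROWN", 11: "GREEN"},
}

# Equivalence axioms (assumed encoding): virtual predicate ->
# (head variables, conjunction of literals that replaces the head).
AXIOMS = {
    "VEMP.NAME": (("?i", "?x"), [("EMP.NAME", "?i", "?x")]),
    "VEMP.MGR":  (("?i", "?x"), [("EMP.DNAME", "?i", "?y"),
                                 ("DEPT.NAME", "?j", "?y"),
                                 ("DEPT.MGR", "?j", "?x")]),
}

_fresh = count()  # supplies suffixes for variables local to an axiom body


def is_var(term):
    """Variables are strings that start with '?'."""
    return isinstance(term, str) and term.startswith("?")


def rewrite(query):
    """Replace every virtual literal by the right-hand side of its axiom."""
    result = []
    for pred, *args in query:
        if pred not in AXIOMS:
            result.append((pred, *args))
            continue
        head_vars, body = AXIOMS[pred]
        binding = dict(zip(head_vars, args))
        suffix = next(_fresh)
        for bpred, *bargs in body:
            # Head variables take the literal's arguments; variables that
            # occur only in the body are renamed so they stay distinct.
            new_args = [binding.get(a, f"{a}_{suffix}" if is_var(a) else a)
                        for a in bargs]
            # Recurse in case a body literal is itself virtual.
            result.extend(rewrite([(bpred, *new_args)]))
    return result


def unify(term, value, env):
    """Match one argument against a stored value, extending env if needed."""
    if is_var(term):
        if term in env:
            return env[term] == value
        env[term] = value
        return True
    return term == value


def solve(query, env=None):
    """Enumerate all variable bindings that satisfy a base query."""
    env = env or {}
    if not query:
        yield env
        return
    (pred, a, b), rest = query[0], query[1:]
    for tid, val in BASE[pred].items():
        new_env = dict(env)
        if unify(a, tid, new_env) and unify(b, val, new_env):
            yield from solve(rest, new_env)


# "Who is the manager of Mr. SMITH?", written as in query (2).
query = [("VEMP.NAME", "?i", "SMITH"), ("VEMP.MGR", "?i", "?x")]
base_query = rewrite(query)
answers = {env["?x"] for env in solve(base_query)}
print(base_query)
print(answers)
```

Because every rewriting step replaces a literal by an equivalent conjunction, the base query has the same answer set as the original query; the naive nested enumeration in `solve` merely stands in for the access plan.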
It is obvious that any query which is specified in terms of virtual relations is translated to an equivalent base query by logical inference. However, the transformation by the resolution rule which is based on the modus ponens is insufficient if we want to get all answers which satisfy the given specification. We can prove it easily. Denote a query by F[?x] and the required answers by {?x | F[?x]}. If we obtain a base query G[?x] by applying the resolution rules, then G[?x] → F[?x]. Therefore, {?x | G[?x]} ⊆ {?x | F[?x]}, where the equality holds only if G[?x] ≡ F[?x].

So far, these transformations can be realized by the query modification technique [10]. As far as the control structure is concerned, it is equivalent to the input resolution in the GL-resolution which is known to be valid only for a Horn set [13]. But there exist more complicated axioms which require the whole inference capability including the ancestor resolution. We will introduce a few such axioms later on.

A virtual domain can be defined in terms of other predefined virtual domains. The axioms for such domains transform a literal to a conjunction of literals some of which are not the base relation literals.

In this paper, we consider only existentially quantified queries. It is easily shown that the resulting base queries after applying the transformations are also only existentially quantified. Therefore, we further simplify the notation for queries by omitting all quantifiers.

3. Deletion of Redundancies

A base query which is obtained so far may have some redundancies. Let us consider the base relations:

EMP(NAME, DNAME); DEPT(NAME, MGR, LOG),

and the virtual relations:

VEMP(NAME, DNAME, LOG); VDEPT(NAME, MGR).

Note that the LOG domain is transitive.
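As a small illustration of the schemas just introduced, the two virtual relations can be computed from the base relations as follows. This is a sketch under stated assumptions: the sample tuples and the function names `vemp` and `vdept` are invented here, and the join condition EMP.DNAME = DEPT.NAME used to carry LOG over to VEMP is assumed by analogy with the transitive MGR domain of Section 2.

```python
# Hypothetical sample data for the base relations.
EMP = [("SMITH", "SALES"), ("JONES", "TOYS")]      # EMP(NAME, DNAME)
DEPT = [("SALES", "BROWN", "TOKYO"),               # DEPT(NAME, MGR, LOG)
        ("TOYS", "GREEN", "OSAKA")]


def vemp(emp, dept):
    """VEMP(NAME, DNAME, LOG): LOG is carried over from DEPT.

    Assumption: an employee tuple is linked to its department tuple
    by EMP.DNAME = DEPT.NAME.
    """
    return [(name, dname, log)
            for (name, dname) in emp
            for (dept_name, _mgr, log) in dept
            if dname == dept_name]


def vdept(dept):
    """VDEPT(NAME, MGR): DEPT with the LOG domain projected away."""
    return [(name, mgr) for (name, mgr, _log) in dept]


print(vemp(EMP, DEPT))
print(vdept(DEPT))
```

A query that mentions both VEMP.LOG and VDEPT.MGR would, after rewriting, refer to the DEPT relation through each virtual relation separately, which is the kind of repetition that redundancy deletion is meant to remove.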
Assume that the following query is given: [...] It asks for the manager of the department y to which the employee i located at 1*1*2 belongs. The expression (3) is transformed to the following

Similar resources

Efficient Evaluation of Logic Programs for Querying Data Integration Systems

Many data integration systems provide transparent access to heterogeneous data sources through a unified view of all data in terms of a global schema, which may be equipped with integrity constraints on the data. Since these constraints might be violated by the data retrieved from the sources, methods for handling such a situation are needed. To this end, recent approaches model query answering...

On Closed World Data Bases

Deductive question-answering systems generally evaluate queries under one of two possible assumptions which we in this paper refer to as the open and closed world assumptions. The open world assumption corresponds to the usual first order approach to query evaluation: Given a data base DB and a query Q, the only answers to Q are those which obtain from proofs of Q given DB as hypotheses. Under ...

Formalizing and Studying Dialectical Explanations in Inconsistent Knowledge Bases. (Formalisation et Etude des Explications Dialectiques dans les Bases de Connaissances Incohérentes)

Knowledge bases are deductive databases where the machinery of logic is used to represent domain-specific and general-purpose knowledge over existing data. In the existential rules framework, a knowledge base is composed of two layers: the data layer which represents the factual knowledge, and the ontological layer that incorporates rules of deduction and negative constraints. The main reasonin...

Deductive Question Answering from Multiple Resources

Questions in natural language are answered by consulting multiple sources and inferring answers from information they provide. An automated deduction system, equipped with an axiomatic application-domain theory, serves as the coordinator for the process. Sources include data bases, Web pages, programs, and unstructured text. Answers may contain text or visualizations. Although the approach is d...

Presenting a Religious Question Answering Corpus in the Persian Language

Question answering system is a field in natural language processing and information retrieval noticed by researchers in these decades. Due to a growing interest in this field of research, the need to have appropriate data sources is perceived. Most researches about developing question answering corpus area have been done in English so far, but in other languages as Persian, the lack of these co...

Learning an Optimal Sequence of Questions for the Disambiguation of Queries over Structured Data

Intelligent systems interacting with users often need to relate ambiguous natural language phrases to formal entities which can be further processed. This work strives for learning an optimal sequence of disambiguation questions asked by an agent in order to achieve a perfect interactive disambiguation, setting itself off against previous work on interactive and adaptive dialogue systems for di...

Publication date: 1977